On the Characterization of Q-superlinear Convergence
of Quasi-Newton Methods for Constrained Optimization

Abstract 

In this paper we present a short, straightforward and self-contained derivation of the
Boggs-Tolle-Wang characterization of those quasi-Newton methods for equality constrained
optimization which produce q-superlinearly convergent iterates.

1. Introduction


We begin by considering the equality constrained optimization problem

$$
\begin{aligned}
&\text{minimize} && f(x) \\
&\text{subject to} && g(x) = 0,
\end{aligned}
\tag{1.1}
$$

where $f: R^n \to R$ and $g: R^n \to R^m$ ($m < n$). Along with problem (1.1) we consider the Lagrangian
$\ell(x,\lambda) = f(x) + \lambda^T g(x)$, a local solution $x_*$, and its associated multiplier $\lambda_*$ (i.e. $\lambda_*$ is such that
$\nabla_x \ell(x_*,\lambda_*) = 0$).

On occasion we will denote an operator evaluated at $x_k$ or $x_*$ by deleting the argument but
instead using the subscript $k$ or $*$ as the case may be, e.g., $g_* = g(x_*)$ or $f_k = f(x_k)$. We also
denote the Hessian of the Lagrangian at $(x_*,\lambda_*)$ by $w_*$ (i.e. $w_* = \nabla_x^2 \ell(x_*,\lambda_*)$).

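By a successive quadratic programming (SQP) quasi-Newton method for problem (1.1) we
mean the iterative procedure

$$ x_{k+1} = x_k + s_k, \qquad k = 0, 1, \ldots \tag{1.2} $$

where $s_k$ solves the quadratic program

$$
\begin{aligned}
&\text{minimize} && \nabla f_k^T s + \tfrac{1}{2}\, s^T B_k s \\
&\text{subject to} && \nabla g_k^T s + g_k = 0
\end{aligned}
\tag{1.3}
$$

for given $B_k$.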

In the analysis of convergence rates for the SQP method the following assumptions are standard:

A1: $f, g \in C^2(D)$ where $D$ is an open neighborhood of $x_*$,
A2: $x_k \in D$ and $\{x_k\}$ converges to $x_*$,
A3: $\nabla g(x)$ has full rank $\forall x \in D$,
A4: $\eta^T w_* \eta > 0$ $\forall \eta \ne 0$ such that $\nabla g_*^T \eta = 0$,
A5: $\eta^T B_k \eta > 0$ $\forall \eta \ne 0$ such that $\nabla g_k^T \eta = 0$ and $\forall k$.

Assumption A4 is second order sufficiency for problem (1.1) and Assumption A5 is second order
sufficiency for the SQP subproblem (1.3). It follows that our subproblem is convex, has a unique
solution and our iterative procedure is well-defined. Moreover, for $x \in D$ Assumption A3 allows
us to consider the projection operator

$$ P(x) = I - \nabla g(x)\,(\nabla g(x)^T \nabla g(x))^{-1}\,\nabla g(x)^T . \tag{1.4} $$

Clearly $P(x)$ projects onto the null space of $\nabla g(x)^T$.
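The properties of the projection (1.4) can be checked numerically. The following is a minimal sketch for a single constraint ($m = 1$) in $R^3$; the gradient vector `a`, the test vector `v` and the helper names `projector` and `matvec` are made up for illustration and do not come from the paper.

```python
# Sketch: the projector (1.4) for one constraint (m = 1), using only the
# standard library. The data below are hypothetical.

def projector(a):
    """Return P = I - a (a^T a)^{-1} a^T for a nonzero vector a."""
    n = len(a)
    aa = sum(ai * ai for ai in a)  # scalar a^T a, nonzero by full rank (A3)
    return [[(1.0 if i == j else 0.0) - a[i] * a[j] / aa for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

a = [1.0, 2.0, 2.0]    # plays the role of grad g(x)
v = [3.0, -1.0, 4.0]   # arbitrary vector to project

P = projector(a)
Pv = matvec(P, v)

dot_a_Pv = sum(ai * pi for ai, pi in zip(a, Pv))  # a^T (P v), should vanish
Pa = matvec(P, a)                                 # P a, should be the zero vector
PPv = matvec(P, Pv)                               # P (P v), should equal P v
print(Pv, dot_a_Pv, Pa, PPv)
```

Up to rounding, `P v` is orthogonal to `a`, `P a` is zero, and applying `P` twice changes nothing, which is what "projects onto the null space of $\nabla g(x)^T$" means.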

Suppose that $\{x_k\}$ has been generated by the SQP method. Boggs, Tolle and Wang [1]
show that, under the assumption that the convergence of $\{x_k\}$ to $x_*$ is q-linear, the convergence
will also be q-superlinear, i.e.

$$ \lim_{k \to \infty} \frac{\|x_{k+1} - x_*\|}{\|x_k - x_*\|} = 0 \tag{1.5} $$

if and only if

$$ \lim_{k \to \infty} \frac{\|P_k [B_k - w_*]\, s_k\|}{\|s_k\|} = 0 . \tag{1.6} $$


This characterization result is a nice extension to constrained optimization of the Dennis-Moré [2]
characterization for unconstrained optimization. Recently, Fontecilla, Steihaug and Tapia [4]
derived the Boggs-Tolle-Wang characterization without the q-linear convergence assumption.
Even more recently Nocedal and Overton [7] also derived this characterization without the
q-linear convergence assumption.
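As a numerical illustration of how (1.5) and (1.6) go together, the sketch below runs the SQP iteration (1.2)-(1.3) on a small test problem. Everything in it is an assumption made for illustration and not taken from the paper: the problem (minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, with solution $x_* = (-1,-1)$, multiplier $1/2$ and hence $w_* = I$), the starting point, the fixed choice $B_k = cI$, and the function name `sqp_ratios`. For $B_k = cI$ and one constraint, the quadratic program (1.3) has the closed-form solution used in the code.

```python
import math

# Hypothetical test problem (not from the paper):
#   minimize x1 + x2  subject to  x1^2 + x2^2 - 2 = 0.
# Solution x* = (-1, -1), multiplier 1/2, Lagrangian Hessian w* = I.
X_STAR = (-1.0, -1.0)

def sqp_ratios(c, steps, x=(-1.2, -0.9)):
    """Run SQP with the fixed choice B_k = c*I.

    Returns the error ratios in (1.5) and the ratios in (1.6), one per step.
    """
    err_ratios, btw_ratios = [], []
    for _ in range(steps):
        a = (2.0 * x[0], 2.0 * x[1])        # grad g(x_k)
        g = x[0] ** 2 + x[1] ** 2 - 2.0     # g(x_k)
        aa = a[0] ** 2 + a[1] ** 2          # a^T a
        af = a[0] + a[1]                    # a^T grad f, with grad f = (1, 1)
        Pgf = (1.0 - a[0] * af / aa, 1.0 - a[1] * af / aa)  # P_k grad f_k
        # Closed-form solution of (1.3) when B_k = c*I:
        #   s_k = -a g / (a^T a) - (1/c) P_k grad f_k
        s = (-a[0] * g / aa - Pgf[0] / c, -a[1] * g / aa - Pgf[1] / c)
        # Ratio in (1.6); here B_k - w* = (c - 1) I.
        a_dot_s = a[0] * s[0] + a[1] * s[1]
        Ps = (s[0] - a[0] * a_dot_s / aa, s[1] - a[1] * a_dot_s / aa)
        btw_ratios.append(abs(c - 1.0) * math.hypot(*Ps) / math.hypot(*s))
        # Ratio in (1.5).
        e_old = math.hypot(x[0] - X_STAR[0], x[1] - X_STAR[1])
        x = (x[0] + s[0], x[1] + s[1])
        e_new = math.hypot(x[0] - X_STAR[0], x[1] - X_STAR[1])
        err_ratios.append(e_new / e_old)
    return err_ratios, btw_ratios

exact_err, exact_btw = sqp_ratios(c=1.0, steps=3)      # B_k = w*
scaled_err, scaled_btw = sqp_ratios(c=5.0, steps=20)   # B_k = 5 I
print("B_k = w*  :", exact_err[-1], exact_btw[-1])
print("B_k = 5 I :", scaled_err[-1], scaled_btw[-1])
```

With $B_k = w_*$ the ratio in (1.6) is identically zero and the error ratios shrink rapidly toward zero. With $B_k = 5I$ the ratio in (1.6) stays bounded away from zero and the error ratios settle near a constant below one, that is, the convergence is only q-linear. The run with $B_k = w_*$ is kept to a few steps so that the errors stay well above rounding level.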


The following statements serve to motivate the present work. All three previous derivations
of the Boggs-Tolle-Wang characterization leave something to be desired. The Boggs, Tolle and
Wang [1] derivation is neither short nor direct and uses the unnecessary assumption of q-linear
convergence; however, we emphasize that it was the first derivation. The Fontecilla, Steihaug and
Tapia [4] derivation is lengthy and not direct. This is to be expected since they solve a more
difficult problem. Specifically, they obtain the Boggs-Tolle-Wang characterization as a special case
of a characterization result for a more general class of quasi-Newton methods than those
considered here. Members of their class need not give iterates which satisfy linearized constraints.
Nocedal and Overton [7] give a short and direct derivation. However, their derivation is based on
an existence theorem and a differentiation formula from differential geometry. The theorem and
the formula are due to Goodman [5] and are nontrivial. It is not clear how their derivation could
be given in a complete manner in an elementary presentation. The derivation of the
Boggs-Tolle-Wang characterization was not the principal issue of these latter two papers.

In Section 2 we present several formulations which are equivalent to the SQP quasi-Newton
formulation. In Section 3 we use one of these equivalent formulations and the Dennis-Moré
characterization to derive the Boggs-Tolle-Wang characterization. Concluding remarks are given in
Section 4.

2.  Formulations  Equivalent  to  SQP 

The  material  in  this  section  is  taken  from  Tapia  [10].  The  reader  interested  in  motivation 
and  further  detail  is  referred  to  that  paper. 

Extended  System  Formulation 

If we apply the first order necessary conditions to the quadratic program (1.3) we see that
the SQP quasi-Newton step $s$ and its associated multiplier $\lambda$ can be obtained from the following
linear system:

$$ B_k s + \nabla g_k \lambda = -\nabla f_k \tag{2.1a} $$

$$ \nabla g_k^T s = -g_k . \tag{2.1b} $$

By  Assumption  A5  we  know  that  (1.3)  is  a  convex  program.  It  follows  that  in  this  case  the  first 
order  necessary  conditions  are  also  sufficient  conditions.  Also  from  A5  we  know  that  s  is  unique. 
This  means  that  the  quadratic  program  (1.3)  and  the  linear  system  (2.1)  determine  the  same  s 
and  it  is  necessarily  unique. 
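For readers who want the intermediate step, the system (2.1) is the stationarity condition of the Lagrangian of the quadratic program (1.3). The symbol $L_k$ below is introduced here only for this illustration, and $B_k$ is assumed symmetric.

```latex
\[
L_k(s,\lambda) = \nabla f_k^T s + \tfrac{1}{2}\, s^T B_k s
               + \lambda^T \bigl(\nabla g_k^T s + g_k\bigr),
\]
\[
\nabla_s L_k(s,\lambda) = \nabla f_k + B_k s + \nabla g_k \lambda = 0
\;\Longleftrightarrow\; \text{(2.1a)},
\qquad
\nabla_\lambda L_k(s,\lambda) = \nabla g_k^T s + g_k = 0
\;\Longleftrightarrow\; \text{(2.1b)}.
\]
```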

Multiplier  Substitution  Formulation 

We will show that determining $s$ from (2.1) is equivalent to determining $s$ from the linear
system

$$ [P_k B_k + \nabla g_k \nabla g_k^T]\, s = -[P_k \nabla f_k + \nabla g_k g_k] . \tag{2.2} $$

Toward this end observe that if we define

$$ \lambda' = -(\nabla g_k^T \nabla g_k)^{-1} \nabla g_k^T (B_k s + \nabla f_k), \tag{2.3} $$

then we can write

$$ P_k [B_k s + \nabla f_k] = B_k s + \nabla f_k + \nabla g_k \lambda' . \tag{2.4} $$

Suppose that $s$ has been obtained from (2.1). Multiplying (2.1a) by $P_k$, recalling that $P_k \nabla g_k = 0$
and using (2.1b) we see that $s$ satisfies (2.2). Now, suppose $s$ satisfies (2.2). Multiplying (2.2) by
$\nabla g_k^T$ and recalling that $\nabla g_k^T P_k = 0$ we obtain $\nabla g_k^T \nabla g_k (\nabla g_k^T s + g_k) = 0$; since
$\nabla g_k^T \nabla g_k$ is nonsingular by A3, (2.1b) is satisfied. Substituting (2.1b) into (2.2) shows that
the left-hand side of (2.4) is zero; hence the right-hand side of (2.4) is zero. This means that
$(s,\lambda')$ is the unique solution of (2.1).
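The equivalence of (2.1) and (2.2) can be checked on a small example. The sketch below uses $n = 2$, $m = 1$ and made-up data: `B` stands for $B_k$ (indefinite, but positive definite on the null space of $\nabla g_k^T$, so A5 holds), `a` for $\nabla g_k$, `gf` for $\nabla f_k$ and `gk` for $g_k$. The helper `solve` is a plain Gaussian elimination written for this illustration.

```python
# Sketch: compare the extended system (2.1) with the multiplier
# substitution system (2.2) on hypothetical data (n = 2, m = 1).

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= t * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

B = [[3.0, 0.0], [0.0, -1.0]]  # B_k: indefinite, positive definite on null(a^T)
a = [1.0, 1.0]                 # grad g_k
gf = [1.0, -2.0]               # grad f_k
gk = 0.5                       # g(x_k)

# Extended system (2.1): unknowns (s, lambda).
K = [[B[0][0], B[0][1], a[0]],
     [B[1][0], B[1][1], a[1]],
     [a[0], a[1], 0.0]]
s1, s2, lam = solve(K, [-gf[0], -gf[1], -gk])

# Multiplier substitution system (2.2): unknown s only.
aa = a[0] ** 2 + a[1] ** 2
P = [[(1.0 if i == j else 0.0) - a[i] * a[j] / aa for j in range(2)]
     for i in range(2)]
M22 = [[sum(P[i][k] * B[k][j] for k in range(2)) + a[i] * a[j] for j in range(2)]
       for i in range(2)]
rhs = [-(sum(P[i][k] * gf[k] for k in range(2)) + a[i] * gk) for i in range(2)]
t1, t2 = solve(M22, rhs)

# Multiplier recovered from (2.3).
lam_prime = -sum(a[i] * (B[i][0] * t1 + B[i][1] * t2 + gf[i]) for i in range(2)) / aa

print((s1, s2, lam))
print((t1, t2, lam_prime))
```

Both systems give the same step, approximately $s = (-1.25, 0.75)$, and the multiplier computed from (2.3) agrees with the one from (2.1), approximately $2.75$.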

3.  Derivation  of  the  Boggs-Tolle-Wang  Characterization 

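We begin with several simple observations. If $P$ is given by (1.4) then

$$ P(x) \nabla f(x) = \nabla f(x) + \nabla g(x) \lambda(x) \tag{3.1} $$

where

$$ \lambda(x) = -(\nabla g(x)^T \nabla g(x))^{-1} \nabla g(x)^T \nabla f(x) . \tag{3.2} $$

It follows from (3.1) and (3.2) that if

$$ F(x) = P(x) \nabla f(x) + \nabla g(x) g(x), \tag{3.3} $$

then

$$ F'(x_*) = P_* w_* + \nabla g_* \nabla g_*^T . \tag{3.4} $$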
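One way to verify (3.4), sketched here under Assumptions A1 and A3 (which make $P$ continuously differentiable on $D$), is to use $P(x)\nabla g(x) = 0$ to insert the fixed multiplier $\lambda_*$ before differentiating. This is an illustrative route and not necessarily the computation the authors had in mind; $h$ denotes an arbitrary direction in $R^n$.

```latex
\begin{align*}
F(x) &= P(x)\,[\nabla f(x) + \nabla g(x)\lambda_*] + \nabla g(x)\,g(x)
      = P(x)\,\nabla_x \ell(x,\lambda_*) + \nabla g(x)\,g(x), \\
F'(x)h &= [P'(x)h]\,\nabla_x \ell(x,\lambda_*)
        + P(x)\,\nabla_x^2 \ell(x,\lambda_*)\,h
        + [(\nabla g)'(x)h]\,g(x)
        + \nabla g(x)\,\nabla g(x)^T h .
\end{align*}
% At x = x_* we have \nabla_x \ell(x_*,\lambda_*) = 0 and g(x_*) = 0,
% so the first and third terms vanish:
\[
F'(x_*)h = P_* w_* h + \nabla g_* \nabla g_*^T h .
\]
```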

Thus, we can interpret (2.2) (and therefore SQP) as a quasi-Newton method applied to the
nonlinear system $F(x) = 0$ where $F$ is given by (3.3) and the approximation to the Jacobian $F'(x_k)$ is
given by $P_k B_k + \nabla g_k \nabla g_k^T$. Moreover, if $F'(x_*)$ is singular, then by the equivalence between (2.1)
and (2.2) it follows that the matrix

$$ \begin{pmatrix} w_* & \nabla g_* \\ \nabla g_*^T & 0 \end{pmatrix} \tag{3.5} $$

is singular. This in turn implies that the quadratic program (1.3) with $x_k = x_*$ and $B_k = w_*$ does
not have a unique solution. This statement contradicts Assumptions A3 and A4.

Now, since $F \in C^1(D)$ and $F'(x_*)$ is nonsingular, the Dennis-Moré [2] characterization
applies and (2.2) (therefore SQP) generates iterates which are q-superlinearly convergent if and
only if

$$ \lim_{k \to \infty} \frac{\|[P_k B_k + \nabla g_k \nabla g_k^T - (P_* w_* + \nabla g_* \nabla g_*^T)]\, s_k\|}{\|s_k\|} = 0 . \tag{3.6} $$

Finally, by adding and subtracting $P_k w_*$ in (3.6) we see that (3.6) is equivalent to the
Boggs-Tolle-Wang condition (1.6).
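The add-and-subtract step can be written out as follows. This is a sketch that uses only the continuity of $P$ and $\nabla g$ on $D$ (Assumptions A1 and A3) and the convergence $x_k \to x_*$ (Assumption A2).

```latex
\begin{align*}
[P_k B_k + \nabla g_k \nabla g_k^T - (P_* w_* + \nabla g_* \nabla g_*^T)]\, s_k
  &= P_k [B_k - w_*]\, s_k
   + (P_k - P_*)\, w_*\, s_k
   + (\nabla g_k \nabla g_k^T - \nabla g_* \nabla g_*^T)\, s_k .
\end{align*}
% The last two terms are o(\|s_k\|):
\[
\frac{\|(P_k - P_*)\, w_*\, s_k\|}{\|s_k\|} \le \|P_k - P_*\|\,\|w_*\| \to 0,
\qquad
\frac{\|(\nabla g_k \nabla g_k^T - \nabla g_* \nabla g_*^T)\, s_k\|}{\|s_k\|}
  \le \|\nabla g_k \nabla g_k^T - \nabla g_* \nabla g_*^T\| \to 0 .
\]
```

Hence the limit in (3.6) is zero exactly when the limit in (1.6) is zero.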

4.  Summary  and  Concluding  Remarks 

In this note we have presented what we consider to be a short, direct and self-contained
derivation of the Boggs-Tolle-Wang characterization of q-superlinear convergence for
quasi-Newton methods for constrained optimization. While we have stated that the three previous
derivations (Boggs, Tolle and Wang [1]; Fontecilla, Steihaug and Tapia [4]; and Nocedal and
Overton [7]) leave something to be desired, we quickly add that the present work was strongly
influenced by these three papers. Indeed, the basic idea that led to the present derivation was to
attempt to parallel the Nocedal-Overton derivation using a formulation of the quasi-Newton
method which possessed the attribute that all necessary differentiations could be obtained in a
straightforward manner. As we have seen, one of the formulations suggested by Tapia [10]
possesses this property.

The authors acknowledge comments made on an earlier draft of this paper by R.H. Byrd,
J.E. Dennis, H. Martinez, J.J. Moré, T. Steihaug, and especially M. Overton.